From Predicting Predominant Senses to Local Context for Word Sense Disambiguation
نویسندگان
چکیده
Recent work on automatically predicting the predominant sense of a word has proven to be promising (McCarthy et al., 2004). It can be applied (as a first sense heuristic) to Word Sense Disambiguation (WSD) tasks, without needing expensive hand-annotated data sets. Due to the big skew in the sense distribution of many words (Yarowsky and Florian, 2002), the First Sense heuristic for WSD is often hard to beat. However, the local context of an ambiguous word can give important clues to which of its senses was intended. The sense ranking method proposed by (McCarthy et al., 2004) uses a distributional similarity thesaurus. The k nearest neighbours in the thesaurus are used to establish the predominant sense of a word. In this paper we report on a first investigation on how to use the grammatical relations the target word is involved with, in order to select a subset of the neighbours from the automatically created thesaurus, to take the local context into account. This unsupervised method is quantitatively evaluated on SemCor. We found a slight improvement in precision over using the predicted first sense. Finally, we discuss strengths and weaknesses of the method and suggest ways to improve the results in the future.
منابع مشابه
Finding Predominant Word Senses in Untagged Text
In word sense disambiguation (WSD), the heuristic of choosing the most common sense is extremely powerful because the distribution of the senses of a word is often skewed. The problem with using the predominant, or first sense heuristic, aside from the fact that it does not take surrounding context into account, is that it assumes some quantity of handtagged data. Whilst there are a few hand-ta...
متن کاملUsing automatically acquired predominant senses for Word Sense Disambiguation
In word sense disambiguation (WSD), the heuristic of choosing the most common sense is extremely powerful because the distribution of the senses of a word is often skewed. The first (or predominant) sense heuristic assumes the availability of handtagged data. Whilst there are hand-tagged corpora available for some languages, these are relatively small in size and many word forms either do not o...
متن کاملAn Eigenvalue-Based Measure for Word-Sense Disambiguation
Current approaches for word-sense disambiguation (WSD) try to relate the senses of the target words by optimizing a score for each sense in the context of all other words’ senses. However, by scoring each sense separately, they often fail to optimize the relations between the resulting senses. We address this problem by proposing a HITS-inspired method that attempts to optimize the score for th...
متن کاملText Categorization for Improved Priors of Word Meaning
Distributions of the senses of words are often highly skewed. This fact is exploited by word sense disambiguation (WSD) systems which back off to the predominant (most frequent) sense of a word when contextual clues are not strong enough. The topic domain of a document has a strong influence on the sense distribution of words. Unfortunately, it is not feasible to produce large manually sense-an...
متن کاملA Practical Solution to the Problem of Automatic Word Sense Induction
Recent studies in word sense induction are based on clustering global co-occurrence vectors, i.e. vectors that reflect the overall behavior of a word in a corpus. If a word is semantically ambiguous, this means that these vectors are mixtures of all its senses. Inducing a word’s senses therefore involves the difficult problem of recovering the sense vectors from the mixtures. In this paper we a...
متن کامل